
    Improving polyphonic and poly-instrumental music to score alignment

    Music alignment links events in a score to points on the time axis of an audio performance, so that every part of a recording can be indexed according to score information. The automatic alignment presented in this paper is based on a dynamic time warping method. Local distances are computed from the signal's spectral features through an attack-plus-sustain note model. The method is applied to mixtures of harmonic sustained instruments, excluding percussion for the moment. Good alignment has been obtained for polyphony of up to five instruments. The method is robust to difficulties such as trills, vibratos and fast sequences, and it provides an accurate indicator of the position of interpretation errors and of extra or forgotten notes. Implementation optimizations allow long sound files to be aligned in a relatively short time. Evaluation results have been obtained on piano jazz recordings.
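    The core of such an alignment is dynamic time warping over a matrix of local distances between score events and audio frames. A minimal sketch (the local distances here are an arbitrary input matrix, not the paper's attack-plus-sustain spectral distances):

    ```python
    import numpy as np

    def dtw(cost):
        """Dynamic time warping over a precomputed local-distance matrix.

        cost[i, j] is the distance between score event i and audio frame j.
        Returns the accumulated cost and the optimal alignment path as a
        list of (score index, frame index) pairs.
        """
        n, m = cost.shape
        acc = np.full((n, m), np.inf)
        acc[0, 0] = cost[0, 0]
        for i in range(n):
            for j in range(m):
                if i == 0 and j == 0:
                    continue
                best = min(
                    acc[i - 1, j] if i > 0 else np.inf,              # advance in score
                    acc[i, j - 1] if j > 0 else np.inf,              # advance in audio
                    acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf # advance in both
                )
                acc[i, j] = cost[i, j] + best
        # Backtrack from the end to recover the optimal path.
        i, j = n - 1, m - 1
        path = [(i, j)]
        while (i, j) != (0, 0):
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1, j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i, j - 1], (i, j - 1)))
            i, j = min(candidates)[1]
            path.append((i, j))
        return acc[-1, -1], path[::-1]
    ```

    A low-cost diagonal in the matrix yields the expected one-to-one alignment; deviations from the diagonal are exactly the places where the score and the performance disagree, which is what makes the method usable as an error indicator.
    
    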

    Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations

    In this paper, a unified trajectory model based on the stylization and modelling of f0 variations simultaneously over various temporal domains is proposed. The syllable is used as the minimal temporal domain for the description of speech prosody, and short-term and long-term f0 variations are stylized and modelled jointly. During training, a context-dependent model is estimated from the jointly stylized f0 contours over the syllable and a set of long-term temporal domains. During synthesis, f0 variations are determined using the long-term variations as trajectory constraints. In a subjective evaluation in speech synthesis, the stylization and trajectory modelling of short- and long-term speech prosody variations is shown to model speech prosody consistently and to outperform conventional short-term modelling.
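    A common way to stylize an f0 contour over a syllable is a low-order least-squares fit per unit. The sketch below is a generic illustration of that idea under assumed inputs (per-frame f0 values and syllable boundaries), not the paper's exact parameterization:

    ```python
    import numpy as np

    def stylize_f0(f0, syllable_bounds, order=2):
        """Stylize an f0 contour per syllable with low-order polynomials.

        `f0` is a 1-D array of per-frame f0 values, `syllable_bounds` a
        list of (start, end) frame indices. Each syllable's contour is
        replaced by its least-squares polynomial fit of the given order,
        discarding short-term detail while keeping the syllable's shape.
        """
        styled = f0.astype(float).copy()
        for start, end in syllable_bounds:
            # Normalized time axis within the syllable.
            t = np.linspace(0.0, 1.0, end - start)
            coeffs = np.polyfit(t, f0[start:end], order)
            styled[start:end] = np.polyval(coeffs, t)
        return styled
    ```

    Fitting over longer temporal domains (word, phrase) with the same machinery yields the long-term contours that the trajectory model uses as constraints.
    
    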

    A Multi-Level Context-Dependent Prosodic Model applied to duration modeling

    This paper presents a multi-level context-dependent prosodic model based on the estimation of prosodic parameters on a set of well-defined linguistic units. Different linguistic units are used to represent different scales of prosodic variation (local and global forms) and thus to estimate the linguistic factors that can explain the variations of prosodic parameters independently at each level. The model is applied to the modeling of syllable-based durational parameters on two read speech corpora: laboratory and acted speech. Compared to a syllable-based baseline model, the proposed approach improves performance in terms of the temporal organization of the predicted durations (correlation score) and reduces the model's complexity, while showing comparable performance in terms of relative prediction error. Index Terms: speech synthesis, prosody, multi-level model, context-dependent model.

    Rényi Information Measures for Spectral Change Detection

    Full text link
    Change detection within an audio stream is an important task in several domains, such as classification and segmentation of a sound or of a music piece, as well as indexing of broadcast news or surveillance applications. In this paper we propose two novel methods for spectral change detection that make no assumption about the input sound: both are based on information measures evaluated on a time-frequency representation of the signal, in particular the spectrogram. The class of measures we consider, the Rényi entropies, is obtained by extending the Shannon entropy definition: the dependence of these measures on a parameter biases the spectrogram coefficients, which allows refined results compared to those obtained with standard divergences. The methods have a low computational cost and are well suited as a support for higher-level analysis, segmentation and classification algorithms.
    Comment: 2011 IEEE Conference on Acoustics, Speech and Signal Processing
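    The Rényi entropy of order α generalizes Shannon entropy as H_α(p) = log(Σ p_i^α)/(1 − α); larger α weights the dominant spectrogram coefficients more heavily. A minimal sketch of a per-frame entropy curve whose jumps suggest change points (the detection step here is a hypothetical difference criterion, not the authors' exact divergence):

    ```python
    import numpy as np

    def renyi_entropy(p, alpha=3.0):
        """Rényi entropy (base 2) of a nonnegative distribution p, alpha != 1.

        As alpha -> 1 this tends to the Shannon entropy; larger alpha
        biases the measure towards the dominant coefficients.
        """
        p = p / p.sum()
        return np.log2((p ** alpha).sum()) / (1.0 - alpha)

    def spectral_change(frames, alpha=3.0):
        """Frame-to-frame change curve from Rényi entropies of a spectrogram.

        `frames` is a (num_frames, num_bins) magnitude spectrogram; each
        frame's energy distribution is treated as a probability vector.
        Peaks in the returned curve suggest spectral change points.
        """
        h = np.array([renyi_entropy(f ** 2, alpha) for f in frames])
        return np.abs(np.diff(h))
    ```

    A flat (noise-like) frame has entropy near log2 of the number of bins, while a single-peak (tonal) frame has entropy near zero, so transitions between the two register as large values in the curve.
    
    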

    Prosodic control of unit-selection speech synthesis: A probabilistic approach


    The Importance of Cross Database Evaluation in Sound Classification

    In numerous articles (Martin and Kim, 1998; Fraser and Fujinaga, 1999; and many others), sound classification algorithms are evaluated using "self classification": the learning and test groups are randomly selected from the same sound database. We show that self classification is not necessarily a good indicator of a classification algorithm's ability to learn, generalize or classify well. We introduce the alternative "Minus-1 DB" evaluation method and demonstrate that it does not share the shortcomings of self classification.
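    As described, the Minus-1 DB method is leave-one-database-out evaluation: each database in turn is held out for testing while the classifier is trained on all the others. A minimal sketch, where `train_fn` and `test_fn` are placeholder fit/score callables (an assumed interface, not from the paper):

    ```python
    def minus_one_db_evaluation(databases, train_fn, test_fn):
        """Leave-one-database-out ("Minus-1 DB") evaluation.

        `databases` maps a database name to its labeled examples. For each
        database, the classifier is trained on the union of all the other
        databases and scored on the held-out one, so every score reflects
        generalization across recording conditions rather than within-set
        similarity. Returns one score per held-out database.
        """
        scores = {}
        for held_out, test_set in databases.items():
            train_set = [ex for name, exs in databases.items()
                         if name != held_out for ex in exs]
            model = train_fn(train_set)
            scores[held_out] = test_fn(model, test_set)
        return scores
    ```

    Unlike self classification, a high Minus-1 DB score cannot be obtained by memorizing database-specific artifacts, which is precisely the shortcoming the method is designed to expose.
    
    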

    Analysis of Sound Signals with High Resolution Matching Pursuit

    Sound recordings contain both transients and sustained parts, and a basis expansion is not rich enough to represent all such components efficiently. Pursuit algorithms instead choose the decomposition vectors according to the signal's properties, selecting them from a dictionary much larger than a basis. Matching pursuit is fast to compute but can provide coarse representations; basis pursuit gives a better representation but is very expensive in computation time. This paper develops a high resolution matching pursuit: a fast, high time-resolution, time-frequency analysis algorithm well suited for musical applications.
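    The plain matching pursuit that the paper refines can be sketched in a few lines: greedily pick the dictionary atom most correlated with the residual, subtract its contribution, and repeat. This shows the baseline algorithm only; the high-resolution selection criterion is not reproduced here:

    ```python
    import numpy as np

    def matching_pursuit(signal, dictionary, n_iter=10):
        """Plain matching pursuit over a redundant dictionary.

        `dictionary` is a (n_atoms, signal_length) array of unit-norm
        atoms. At each step the atom most correlated with the residual
        is selected and its contribution subtracted. This greedy choice
        is what can produce the coarse representations that high
        resolution matching pursuit is designed to avoid.
        """
        residual = signal.astype(float).copy()
        atoms = []  # (atom index, coefficient) pairs
        for _ in range(n_iter):
            corr = dictionary @ residual
            k = int(np.argmax(np.abs(corr)))
            atoms.append((k, corr[k]))
            residual = residual - corr[k] * dictionary[k]
            if np.linalg.norm(residual) < 1e-10:
                break
        return atoms, residual
    ```

    With an orthonormal dictionary this recovers the exact expansion; the interesting (and difficult) cases arise when the dictionary is redundant, e.g. a union of windowed sinusoids at many scales.
    
    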

    IrcamCorpusTools: an Extensible Platform for Spoken Corpora Exploitation

    Corpus-based methods are increasingly used for speech technology applications and for the development of theoretical or computer models of spoken languages. These uses range from unit selection speech synthesis [1] to statistical modeling of speech phenomena such as prosody or expressivity [2]. In all cases, they require a wide range of tools for corpus creation, labeling

    Phase Minimization for Glottal Model Estimation
